A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. The exceptions are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high-quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.


I. INTRODUCTION
fMRI measures the time-varying blood oxygen level-dependent (BOLD) signal in the brain in order to infer coarse-grained neuronal activation [1]. It has traditionally been used to localize specific functions such as vision [2], emotion [3] [4] [5], attention [6] [7], and language [8] to discrete cortical areas. Functional connectivity (FC) is the temporal Pearson correlation of BOLD signal between different regions in the brain [9], and has been used to predict demographics such as age [10], sex [11] [12], and race [13], as well as clinical assessments for schizophrenia diagnosis [14] [15] and pre-clinical neurodegenerative disease [16]. Many groups have also attempted to use FC as a biomarker to predict scholastic achievement or general fluid intelligence [17]. Besides Pearson correlation, several other metrics have been used for the calculation of FC, including partial correlation [18] and distance correlation [19]. Others have experimented with calculating dynamic FC [15]. With increasing clinical field strength, prediction based on FC or other fMRI metrics is poised to become increasingly important in research and clinical settings [20].
Concurrently, generative models such as the DALL-E [21] and GPT [22] series have created textual and visual content that in many cases is indistinguishable from human-generated work [23]. In the computer vision domain, generative models include generative adversarial networks (GANs) [24], variational autoencoders (VAEs) [25], and most recently, diffusion-based models [26]. All of these methodologies have been applied to fMRI data for the improvement of predictive ability or for performing image-to-image translation [27] [28]. However, previous work has mostly overlooked subject demographics as a potential input to generative models. Since access to fMRI data in national repositories is often restricted to qualified researchers, we believe that generative models that can produce synthetic data are useful for disseminating information more easily, but only if they can re-create the demographic distribution and individual variation found in the fMRI datasets.
It is well known that fMRI can be used to predict demographics such as age, sex, and race [10] [11] [13]. It is also known that it is crucial to control for demographic confounds when performing statistical analysis [29] [30]. In fact, many simpler models have provisions for regressing out confounds [31]. There is a question, however, as to whether fMRI-based prediction using more complicated models is solely due to demographic signal present in fMRI [13]. To this end, we present a new generative model based on a VAE that decorrelates latent features from subject demographics (DemoVAE). It accomplishes this by forcing such correlations to be zero during training and injecting demographic information in the decoder after calculation of the latent features. We add classifier and regression-guided loss functions [32] to ensure that synthetic samples contain demographics-associated features that are compatible with models trained on real data. We believe our model serves two purposes: 1) generation of representative synthetic data based on datasets that are not accessible to the general public, and 2) creation of fMRI latent features which are free from the confounding effects of demographics. It is also possible that DemoVAE can aid in data harmonization by removing site-specific effects [33] through treating site location as a demographic. These capabilities are validated on two large datasets accessible to qualified researchers.
The rest of this manuscript is organized as follows. Section II gives a recapitulation of the theory of the DemoVAE model and our specific training methodology, as well as a description of the datasets and experiments performed. Section III provides experimental results. Section IV discusses significant conclusions drawn from fMRI group differences found via DemoVAE and how they relate to existing work. Section V concludes with a summary of the work. We make the code publicly available at the link in footnote 1. An online demo is also available (footnote 2).

II. METHODS
First, we discuss the architecture and training of the DemoVAE model, shown in Figure 1. Next, we describe two datasets used for the validation of the model. Then, we outline experiments used to analyze DemoVAE's ability to decorrelate latent features from demographic confounds as well as to generate high-quality synthetic fMRI data. Finally, we describe experiments using DemoVAE for imputation of fMRI data.

A. Variational Autoencoder
An autoencoder (AE) converts raw features into a lower-dimensional latent space via a learned encoder function $z = E_\phi(x)$, along with a decoder function to convert the latent features back into a reconstructed version of the input $\hat{x} = D_\theta(z)$. The AE is often trained to minimize the difference between the reconstruction $\hat{x}$ and original input $x$. Thus, the AE may be seen as a nonlinear version of dimensionality reduction techniques such as PCA or Factor Analysis.
By contrast, a variational autoencoder (VAE) trains the encoder function $E_\phi(x)$ to produce latent features that approximate a known probability distribution $p_\theta(z)$, most often taken to be a standard multivariate Gaussian distribution $p_\theta(z) = \mathcal{N}(0, I)$ [25]. This allows for artificially constructing latent samples $z_{samp}$ from the approximated distribution, followed by the conversion of those latents to samples of the original distribution $p_\theta(x|z)$ by passing through the decoder function. For the following, consider scalar features $x$ and scalar latent features $z$. The exact calculation of $p_\theta(z|x)$ is in most cases intractable; therefore an approximation $q_\phi(z|x) \approx p_\theta(z|x)$ is made, and the Kullback-Leibler (KL) divergence between the two distributions is taken. The evidence lower bound (ELBO) [25] is then defined from this divergence. Maximizing the ELBO is equivalent to maximizing the reconstruction probability $\ln p_\theta(x|z)$ while minimizing the KL divergence between our empirical and target distributions. Given a standard normal distribution for the latent features $p_\theta(z) = \mathcal{N}(0, 1)$, the ELBO objective to be minimized (Equation 3) is expressed in terms of $\mu_z$, the mean of the empirically calculated latent features, $\sigma_z^2$, the variance of the same, and $N$, the number of samples. This loss function can be seen to have three components: a reconstruction loss, two terms that tend to make $\sigma_z$ equal to one, and one term to make the expectation of the latent features equal to zero. Given this loss function, one is able to train a network to sample the distribution of FC data, but not to condition the samples on ancillary subject information such as demographics.
When considering a multivariate standard normal distribution $p_\theta(z) = \mathcal{N}(0, I)$ for the latent features, the KL divergence part takes a more complicated form [34]. This form presents a challenge due to the calculation of, and backpropagation through, the log determinant of the empirical latent covariance matrix $\Sigma_z$. We address this issue as part of our modifications to the VAE loss function presented in Section II-B.
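For reference, the quantities described in this subsection can be written in their standard forms as follows. This is a sketch under the notation above, not a reproduction of the manuscript's numbered equations; in particular, the squared-error reconstruction term and the unweighted sum in the scalar loss are assumptions.

```latex
% KL divergence between the approximate and true posteriors
D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z|x)\right)
  = \mathbb{E}_{q_\phi(z|x)}\left[\ln q_\phi(z|x) - \ln p_\theta(z|x)\right]

% Evidence lower bound [25]
\mathrm{ELBO}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z|x)}\left[\ln p_\theta(x|z)\right]
  - D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z)\right)

% Scalar loss (negative ELBO) for p_\theta(z) = N(0, 1),
% assuming a squared-error reconstruction term
\mathcal{L}
  = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{x}_i)^2
  + \frac{1}{2}\left(\sigma_z^2 - \ln \sigma_z^2 - 1 + \mu_z^2\right)

% Multivariate case, p_\theta(z) = N(0, I), with N_z latent dimensions
D_{KL}\left(\mathcal{N}(\mu_z, \Sigma_z) \,\|\, \mathcal{N}(0, I)\right)
  = \frac{1}{2}\left(\mathrm{tr}(\Sigma_z) + \mu_z^\top \mu_z - N_z
  - \ln\det\Sigma_z\right)
```

In the scalar loss, the terms $\sigma_z^2 - \ln\sigma_z^2$ are jointly minimized at $\sigma_z = 1$ and the term $\mu_z^2$ is minimized at $\mu_z = 0$, which matches the three components described in the text. The $\ln\det\Sigma_z$ term in the multivariate form is the log determinant referred to above.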

B. Demographics-Conditioned and Decorrelated Variational Autoencoder (DemoVAE)
There is an existing body of academic literature [35] [36] as well as practical applications [37] exploring the conditioning of VAEs on user-specified inputs. VAEs have also been applied to the generation of synthetic fMRI data [28], but without considering patient demographics. In this work, we include the known patient demographic features as input to the decoder function $\hat{x} = D_\theta(z, y)$, where $z$ are the latent features and $y$ are the subject demographics. During training, we decorrelate the latent state $z = E_\phi(x)$ from demographic features $y$ so that all of the fMRI signal that can be attributed to demographics is based on user-provided input and not on the encoded latent features. To this end, we make several modifications to the traditional VAE loss function.
1) Incorporate Demographic Information: First, the reconstruction error term of the loss function remains unchanged from the ELBO formulation, except for the injection of demographic information into the decoder. In this term, $N$ is the number of subjects, $x_i$ are the vectorized FC features, $z_i = E_\phi(x_i)$ are the empirically calculated latent features, and $y_i$ are the subject demographics for subject $i$.
2) Extension to Multidimensional Latent Space: Second, we note that the ELBO loss function of the standard VAE is applicable to scalar latent features $z$ and not multi-dimensional latent features $z \in \mathbb{R}^{N_z}$. This may allow for a non-diagonal covariance matrix in the empirical distribution of latents $q_\phi(z|x) = \mathcal{N}(0, \Sigma)$. Thus, we modify a part of the ELBO loss function to specifically target a diagonal covariance matrix and zero expected value for the latents. In this term, $Z \in \mathbb{R}^{N_z \times N}$ is the matrix of all $N_z$ latent features for all $N$ subjects, $z_i$ is the vector of latent feature $i$ for all $N$ subjects, and $\mu_{z_i}$ is its mean. We find that this loss function performs as well as or better than the KL divergence part of ELBO with fMRI data.
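A penalty of this kind can be sketched in a few lines of NumPy. The function name `latent_moment_loss` and the squared-error form of both penalties are assumptions made for illustration; the manuscript's exact expression may differ, and a trainable version would be written in an autodiff framework.

```python
import numpy as np

def latent_moment_loss(Z):
    """Penalty pushing latents toward zero mean and identity covariance.

    Z has shape (N_z, N): one row per latent feature, one column per subject.
    The squared-error form used here is an assumption for illustration.
    """
    n_latent, n_subjects = Z.shape
    mu = Z.mean(axis=1, keepdims=True)           # per-feature mean mu_{z_i}
    centered = Z - mu
    cov = centered @ centered.T / n_subjects     # empirical covariance (N_z x N_z)
    cov_penalty = np.sum((cov - np.eye(n_latent)) ** 2)
    mean_penalty = np.sum(mu ** 2)
    return float(cov_penalty + mean_penalty)
```

Because the penalty acts directly on the empirical covariance, it avoids the log determinant that appears in the multivariate KL divergence.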
3) Decorrelate Latent Features from Demographics: Third, we add a term penalizing correlations between the empirical latent features and four demographic features or clinical outcomes: age, sex, race, and disease status (schizophrenia diagnosis). Where we have multiple fMRI scans using different scanner tasks for each subject, we also decorrelate the latents with respect to scanner task. The penalty is defined in terms of $\rho_{z_j, y_k}$, the correlation between latent feature $z_j$ and demographic feature $y_k$ across all $N$ subjects.
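As a minimal sketch of such a penalty, the function below computes every latent-demographic Pearson correlation and sums their squares. The name `decorrelation_loss` is hypothetical, and squaring the correlations is an assumption; an absolute value would also drive them toward zero.

```python
import numpy as np

def decorrelation_loss(Z, Y, eps=1e-8):
    """Sum of squared Pearson correlations between latents and demographics.

    Z: (N_z, N) latent features; Y: (N_y, N) numerically encoded demographics.
    Returns the scalar penalty and the (N_z, N_y) correlation matrix rho,
    where rho[j, k] is the correlation between z_j and y_k.
    """
    Zc = Z - Z.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Zn = Zc / (np.linalg.norm(Zc, axis=1, keepdims=True) + eps)
    Yn = Yc / (np.linalg.norm(Yc, axis=1, keepdims=True) + eps)
    rho = Zn @ Yn.T
    return float(np.sum(rho ** 2)), rho
```

Categorical demographics such as sex, race, or scanner task would need a numeric encoding (for example one row per one-hot column) before being passed as `Y`.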
4) Classifier Guidance: Finally, while training the DemoVAE, we create synthetic samples based on random choices of demographic inputs, and penalize mispredictions relative to pre-trained models. Given a single demographic prediction from a synthetic latent based on user-input demographics, $\hat{y}_i = f_i(D_\theta(z_{samp}, y))$, we define a guidance loss in which the models $f_i(\cdot)$ are linear models trained on the ground truth fMRI subject data, $y_{i,c}$ is the one-hot encoded true class label for demographic $i$, $p_{i,c}$ is the predicted probability for class $c$ and demographic $i$, and the loss is the mean squared error (MSE) for continuous demographics (age) and cross entropy (CE) error for categorical demographics (sex, race, disease status, scanner task).
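The two loss types named above can be sketched as follows. The function `guidance_loss` and its `kind` argument are hypothetical names; the sketch only evaluates the loss given predictions that a pre-trained model $f_i$ has already produced on synthetic FC, whereas in training the gradient would flow back through $f_i$ and the decoder.

```python
import numpy as np

def guidance_loss(pred, target, kind):
    """Loss between a pre-trained model's prediction on synthetic FC and the
    demographics that were supplied to the decoder.

    kind="continuous":  pred and target are (N,) arrays; loss is MSE.
    kind="categorical": pred is (N, C) class probabilities and target is
                        (N, C) one-hot labels; loss is mean cross entropy.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if kind == "continuous":
        return float(np.mean((pred - target) ** 2))
    if kind == "categorical":
        eps = 1e-12  # avoid log(0)
        return float(-np.mean(np.sum(target * np.log(pred + eps), axis=1)))
    raise ValueError(f"unknown kind: {kind}")
```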
The final loss function for training the DemoVAE combines the terms above, with hyperparameters $\lambda_{1-4}$ chosen alongside learning rate and latent dimension size via random grid search.

C. Generation of Timeseries
The DemoVAE model described works on fixed-length input vectors to produce fixed-length latent feature vectors and fixed-length samples from the distribution of the input. When creating fMRI-derived samples of synthetic FC, we can use the Cholesky decomposition [38] to generate variable-length BOLD timeseries that are compatible with the generated FC. These timeseries may then be used to generate alternate measures of connectivity, e.g., partial correlation-based FC (PCFC).
For fixed-length FC input, which is a symmetric positive semi-definite (PSD) matrix, we can train the decoder function $X = D_\theta(z, y)$ to output a form that can be converted into a symmetric matrix. This can be either the unique upper triangular entries of the matrix, $X_U$, or a low-rank factor $A \in \mathbb{R}^{N_{roi} \times N_r}$, where $N_{roi}$ is the number of regions of interest (ROIs) in the atlas and $N_r$ is the rank of the factor. Note that the matrix built from the upper triangular entries, $X^{(1)}$, may contain a few negative eigenvalues, while the matrix built from the low-rank factor, $X^{(2)}$, is most likely rank deficient, based on the choice of $N_r$. However, the Cholesky decomposition requires a positive definite (PD) matrix as input. In the first case, we find a negligible loss of predictive ability by setting negative eigenvalues $\lambda_{X,i}$ of $X$ to zero or a small positive value $\beta$.
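The two output forms and the eigenvalue correction can be sketched as below. All function names are hypothetical. The sketch assumes that the upper triangular entries exclude the diagonal (which is set to one, as for a correlation matrix) and that the low-rank form is $AA^\top$; neither detail is stated explicitly in the text.

```python
import numpy as np

def vector_to_symmetric(x_u, n_roi):
    """Rebuild a symmetric FC matrix from its strictly upper-triangular
    entries, placing ones on the diagonal (assumed, as for a correlation)."""
    X = np.eye(n_roi)
    rows, cols = np.triu_indices(n_roi, k=1)
    X[rows, cols] = x_u
    X[cols, rows] = x_u
    return X

def low_rank_to_symmetric(A):
    """Form the symmetric PSD matrix A A^T from a factor A (N_roi x N_r)."""
    return A @ A.T

def clip_eigenvalues(X, beta=1e-6):
    """Raise eigenvalues below beta to beta so the result is positive definite."""
    X = (X + X.T) / 2                       # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(X)
    eigvals = np.maximum(eigvals, beta)
    return (eigvecs * eigvals) @ eigvecs.T  # V diag(eigvals) V^T
```

After clipping, `np.linalg.cholesky` can be applied to the result; the low-rank form instead calls for a factorization that tolerates rank deficiency.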
We then choose the standard deviation of timeseries $\sigma_i \sim \mathcal{N}(\mu_{\sigma_i}, \tau^2_{\sigma_i})$ at each ROI, and use it to recompute the covariance matrix $\Sigma$ from the reconstructed or synthetic FC sample $X$, using $\sigma$, a column vector of standard deviations at each ROI, and $\mathbf{1}$, a column vector of ones. Given a rank-deficient but PSD covariance matrix, we can simulate a Cholesky-like decomposition using the eigenvectors and eigenvalues of $\Sigma$ and QR decomposition. Timeseries may then be constructed based on the property of the Cholesky decomposition that a standard normal random variable $X \sim \mathcal{N}(0, 1)$ multiplied by the Cholesky factor $L$ creates a multivariate normal variable vector with zero mean and covariance matrix $\Sigma = LL^\top$. This is compatible with fMRI data, which are usually bandpass filtered prior to analysis. It assumes, however, that the timeseries BOLD signal is stationary, which is sufficient when producing correlation-based metrics. In Table V, we show that this method of generating PCFC via timeseries yields features that make accurate predictions using models trained on the original data, and vice-versa.
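One way to realize this procedure is sketched below. The function names are hypothetical, and two details are assumptions: the covariance is taken to be $\Sigma_{ij} = \sigma_i \sigma_j X_{ij}$, and the Cholesky-like factor is obtained by writing $\Sigma = BB^\top$ with $B = V\Lambda^{1/2}$ and then taking the QR decomposition $B^\top = QR$, so that $\Sigma = R^\top R$ and $L = R^\top$ is lower triangular.

```python
import numpy as np

def fc_to_covariance(X, sigma):
    """Scale a correlation-like matrix X by per-ROI standard deviations."""
    return X * np.outer(sigma, sigma)

def cholesky_like_factor(Sigma):
    """Return a lower-triangular L with L L^T = Sigma for a PSD (possibly
    rank-deficient) Sigma, via eigendecomposition followed by QR."""
    Sigma = (Sigma + Sigma.T) / 2
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    eigvals = np.clip(eigvals, 0.0, None)   # remove round-off negatives
    B = eigvecs * np.sqrt(eigvals)          # B B^T = Sigma
    _, R = np.linalg.qr(B.T)                # B^T = Q R, so Sigma = R^T R
    return R.T

def sample_timeseries(Sigma, n_timepoints, rng):
    """Draw zero-mean Gaussian timeseries (N_roi x T) with covariance Sigma."""
    L = cholesky_like_factor(Sigma)
    white = rng.standard_normal((Sigma.shape[0], n_timepoints))
    return L @ white
```

Because `n_timepoints` is a free argument, the same synthetic FC sample can yield timeseries of any length, from which alternate connectivity measures can be computed.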

D. Datasets
We now describe two datasets used for validation and exploration of the DemoVAE model. Demographics for our subsets of the two data sources may be found in Table I.
1) Philadelphia Neurodevelopmental Cohort: The Philadelphia Neurodevelopmental Cohort (PNC) is a widely used dataset of children and young adults with multi-task fMRI scans for 1,529 subjects [39] and genomic data for more than 9,000 [40], many of whom have both modalities. In addition, the PNC includes data for 169 questionnaire, computerized battery, and in-scanner task parameter fields [41] [42], not all of which are available for every subject. Scholastic achievement was measured using the Wide Range Achievement Test (WRAT) [43], with both a raw score and a score with the effects of age regressed out. The dataset is enriched for subjects of European (EA) and African (AA) ancestry. fMRI scans include three in-scanner tasks: resting state (rest), an n-back working memory task (nback) [44], and an emotion identification task (emoid); not all subjects have all three tasks. We selected a 1,154-subject subset of the entire cohort that included subjects with all three fMRI scanner tasks as well as single nucleotide polymorphism (SNP) data, and who belonged to either of the two predominant ancestry groups.
Acquisition [39] and preprocessing [13] of the fMRI data have been described previously [10]. Briefly, scanning was performed using a whole-body 3T scanner running an echo-planar imaging sequence with a repetition time of TR = 3 s. Data was pre-processed using SPM12 (footnote 3), including regression for motion correction, co-registration, and normalization to MNI space. The Power Atlas [45] of 5 mm spherical regions was used to parcellate the fMRI BOLD images into 264 timeseries. FC was created from these timeseries via Pearson correlation. Partial correlation-based FC was created from these timeseries via the nilearn software package (footnote 4) [46] using the Ledoit-Wolf shrinkage estimator [47].
SNP data was collected using one of eight different platforms, each subject's data being handled using one of these platforms, with the largest platform annotating 1,185,051 SNPs. For our analysis, we chose a subset of 35,621 SNPs that were available on all eight platforms for all subjects. SNPs were categorized by haplotypes as homozygous dominant, heterozygous, homozygous recessive, or missing.
We note the WRAT scores with age regressed out in Table I have not been adjusted for race, as seen from the very significant p-value, implying the possibility of confounding effects. Figure 2 displays the histogram of WRAT score among the two races. One of the goals of the DemoVAE model is to remove the effect of demographics from fMRI features in order to give an unconfounded view of the effect of brain network organization on phenotypic variables, e.g., removing the effects of demographics on scholastic achievement score.
2) Bipolar and Schizophrenia Network for Intermediate Phenotypes: As an additional and independent validation dataset containing clinical phenotypes, we use the Bipolar and Schizophrenia Network for Intermediate Phenotypes cohort of 933 patients, 1059 relatives, and 459 healthy controls [48]. We selected a subset of 185 schizophrenia (SZ) patients and 220 healthy controls for whom we had fMRI scans, excluding subjects with borderline diagnosis such as bipolar and schizoaffective disorder. fMRI data was acquired over six sites, with acquisition and preprocessing of the data described elsewhere [49]. In addition to the fMRI data, the BSNIP dataset contains 32 medication and clinical assessment measures related to patients' psychiatric condition. There is a clear demographic confound when predicting WRAT score from fMRI or genomic data. We show in Table II that DemoVAE is able to remove the effect of this confound, but at the same time, removes the ability to accurately predict WRAT score.

E. Experiments
which is heavily skewed according to ethnic group (Figure 2), using fMRI FC data, SNP data, scalar race indicator, and DemoVAE latents constructed from FC or SNPs. Ridge regression models were trained and evaluated on a set of 20 repetitions of an 80/20 train/test split with the above features, where the best value for the regularization parameter was chosen by random grid search. This experiment was performed to validate the ability of DemoVAE to decorrelate its latent features from demographics, and to demonstrate why demographic confounds in FC may be problematic for downstream analysis.
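The evaluation protocol can be sketched with scikit-learn as follows. The function `repeated_ridge_rmse`, the candidate list `alphas`, and the inner validation split used to pick the regularization parameter are all assumptions for illustration; the manuscript states only that the parameter was chosen by random grid search.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def repeated_ridge_rmse(X, y, alphas, n_repeats=20, test_size=0.2, seed=0):
    """Mean and std of test RMSE over repeated random train/test splits.

    For each split, alpha is chosen on an inner validation split of the
    training data (an assumed selection scheme), then the model is refit
    on the full training portion and scored on the held-out 20%.
    """
    rmses = []
    for rep in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + rep)
        X_fit, X_val, y_fit, y_val = train_test_split(
            X_tr, y_tr, test_size=0.25, random_state=seed + rep)
        val_err = [
            np.mean((Ridge(alpha=a).fit(X_fit, y_fit).predict(X_val) - y_val) ** 2)
            for a in alphas
        ]
        best_alpha = alphas[int(np.argmin(val_err))]
        model = Ridge(alpha=best_alpha).fit(X_tr, y_tr)
        rmses.append(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
    return float(np.mean(rmses)), float(np.std(rmses))
```

The same function could be called once per feature set (FC, SNPs, scalar race indicator, or DemoVAE latents) so that the resulting RMSE values are directly comparable.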

2) Validation of fMRI Samples Generated by DemoVAE:
Several tests were performed to validate that the samples created by DemoVAE accurately capture the distribution of fMRI data and recapitulate group differences between groups having different demographics. We first trained the DemoVAE model using the PNC dataset, including age, sex, and race as demographics, and with the scanner task being set to resting state. We also trained a traditional VAE using the traditional scalar ELBO objective in Equation 3 and no demographic information, as well as a Wasserstein generative adversarial network (W-GAN) model [24] [50] [27]. Synthetic FC samples were then generated for 1,000 subjects using all three models, and the distribution of FC features was visualized in two dimensions using the scikit-learn implementation of t-distributed stochastic neighbor embedding (t-SNE) [51] [52]. Subject demographics for the DemoVAE features were sampled randomly using an equally-weighted Bernoulli (sex, race) or normal (age) distribution. The distribution of synthetic data was compared with ground truth data.
Additionally, we measured the ability of DemoVAE synthetic data to recapitulate group differences in the PNC and BSNIP datasets. We calculated the mean difference in FC between young children and young adults, males and females, EA and AA race, and SZ patients and healthy controls using ground truth data. Then, we created synthetic FC data for those groups using DemoVAE, and compared group differences of real and synthetic data. The RMSE between FC differences of real and synthetic data was calculated and compared with a null model.
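The group-difference comparison reduces to a few array operations, sketched below with hypothetical function names. The null model is not sketched because the text does not define it.

```python
import numpy as np

def group_difference(fc, in_group_a):
    """Mean FC of group A minus mean FC of the remaining subjects.

    fc: (N, N_features) vectorized FC; in_group_a: boolean mask of length N.
    """
    in_group_a = np.asarray(in_group_a, dtype=bool)
    return fc[in_group_a].mean(axis=0) - fc[~in_group_a].mean(axis=0)

def group_difference_rmse(fc_real, mask_real, fc_synth, mask_synth):
    """RMSE between the real and synthetic group-difference maps."""
    diff_real = group_difference(fc_real, mask_real)
    diff_synth = group_difference(fc_synth, mask_synth)
    return float(np.sqrt(np.mean((diff_real - diff_synth) ** 2)))
```

For a continuous demographic such as age, the mask would come from a threshold separating the two groups being compared.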

3) Phenotype Prediction Using DemoVAE Synthetic Data:
The ability of DemoVAE to create synthetic data that recapitulate the demographic content of subject FC was tested by using real data to train demographic-prediction models that were tested on synthetic data and vice versa. The models used were Ridge regression models for continuous variables (age) and Logistic regression models for binary variables (sex, race, SZ diagnosis). The scikit-learn implementations of these models were employed and optimal regularization parameters were chosen using random grid search. Synthetic data was created using the same procedure as in Section II-E.2. Twenty repetitions of each experiment were performed and the results averaged. The same number of synthetic subjects were created as available real subjects: 1,154 for the PNC dataset and 405 for BSNIP.

4) Correlation of Clinical Measures with DemoVAE Latents:
We tested the correlation of fMRI FC data with phenotype and clinical data fields before and after the removal of the confounding effects of demographics. Both the PNC and BSNIP datasets contain phenotype and clinical data which may be correlated with FC features. A subset of 169 phenotype, medication, and cognitive battery fields available in the PNC cohort was correlated with raw FC data, traditional VAE latents, and DemoVAE latents decorrelated from demographic features. Correlation was tested at a significance level of p < 0.05 and p < 0.01, and the number of significant correlations was determined. Significance was determined using a t-test whose statistic depends on $\rho$, the correlation coefficient between FC or latent feature and clinical or computerized battery field, and $n$, the number of samples, i.e., the number of subjects having a value for that clinical or computerized battery field. Each FC, VAE, or DemoVAE feature was correlated independently and Bonferroni correction was applied to the p-value to correct for multiple comparisons.
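A sketch of this test is given below, assuming the standard t statistic for a Pearson correlation, $t = \rho\sqrt{(n-2)/(1-\rho^2)}$ with $n-2$ degrees of freedom, and a two-sided p-value. The function names are hypothetical, and the manuscript's statistic may be written differently.

```python
import numpy as np
from scipy import stats

def correlation_p_value(rho, n):
    """Two-sided p-value for a Pearson correlation rho over n samples,
    using t = rho * sqrt((n - 2) / (1 - rho^2)) with n - 2 degrees of freedom."""
    t_stat = rho * np.sqrt((n - 2) / (1.0 - rho ** 2))
    return 2.0 * stats.t.sf(np.abs(t_stat), df=n - 2)

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction.

    Each p-value is multiplied by the number of tests and capped at 1.
    """
    p_values = np.asarray(p_values, dtype=float)
    corrected = np.minimum(p_values * p_values.size, 1.0)
    return corrected < alpha, corrected
```

Here `n` should be the number of subjects with a non-missing value for the field in question, which varies from field to field.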
In addition to the PNC clinical fields, the BSNIP dataset contained 32 demographic, clinical, and medication fields which were correlated with FC data and VAE latent features in a similar manner. Finally, the PNC dataset contained genomic data for a 1,154-subject subset of subjects with fMRI scans. These genomic data were also correlated with phenotype and cognitive battery fields before and after removal of confounding effects with DemoVAE.
5) Imputation of fMRI Scanner Task: DemoVAE creates latent features that are decorrelated from fMRI scanner task, and can generate samples conditioned on the type of scanner task. We therefore test the ability of DemoVAE to impute scanner task fMRI given fMRI from a different scanner task as input. Imputation was performed either deterministically, by switching the identity of the task $y_i$ the decoder $D_\theta(z, y)$ was

III. RESULTS
This section presents results for the experiments described in Section II-E.

A. Prediction of WRAT Score from Decorrelated Latents
In Table II, we give results for predicting age-adjusted WRAT score in the PNC data from scalar race value, FC data, SNP data, DemoVAE latents derived from FC data, and DemoVAE latents derived from SNPs. We observe that using the scalar race variable yields the best prediction of standardized WRAT score. While FC and SNPs can predict WRAT score moderately well, that predictive ability disappears when latents are decorrelated from race, as in the DemoVAE latents. This demonstrates that DemoVAE is able to decorrelate the fMRI latent state from demographics. It also demonstrates that, while FC and SNPs have the ability to predict age-adjusted WRAT score, that prediction is based on ability to infer demographics, and not on any cognitive signal found in FC that is independent of demographics. We find, as have previous studies, that prediction of scholastic achievement may be highly confounded by race signal present in neuroimaging data [53] [13].

B. Validation of fMRI Samples Generated by DemoVAE
Figure 3 displays a selection of ground truth subject FC data compared to synthetic data generated by DemoVAE, a traditional VAE, and a W-GAN. We note that it is visually hard to distinguish between true subject data and synthetic data. However, this is not the case when comparing the entire distribution of data using t-SNE, as evident in Figure 4. Figure 4 shows the distribution of synthetic DemoVAE data, VAE data, and W-GAN data transformed using t-SNE overlaid on ground truth resting state PNC subject data. DemoVAE data was created using randomly sampled age, sex, and race demographics but with scanner task set to resting state. We see that DemoVAE captures the distribution of fMRI data better than the traditional VAE and W-GAN. It is evident that a GAN makes no guarantees about matching or even approximating the true distribution of data [34] unless additional regularization is performed. Figure 5 displays group differences between demographic subsets of real data compared to group differences from synthetic DemoVAE data. We see that by conditioning on demographic input, DemoVAE can produce samples that accurately recapitulate group differences in FC data. Table IV shows RMSE values for deviation in group differences between synthetic DemoVAE data and real data.

C. Phenotype Prediction Using DemoVAE Synthetic Data
is very high when training using real data and predicting using DemoVAE, and slightly lower when training using DemoVAE and predicting using real data, but still exceeds 90% in all but one instance. Pearson FC and partial correlation-based FC derived using the FC-based timeseries creation procedure described in Section II-C have similar accuracies. This validates our timeseries creation procedure, at least in the context of calculation of alternate measures of connectivity.

D. Correlation of Clinical Measures with DemoVAE Latents
Figure 6 displays the correlation between clinical questionnaire and computerized battery fields of the PNC and BSNIP datasets and fMRI FC data, traditional VAE latents, and demographically-unconfounded DemoVAE latents. We see that removing the effects of demographic confounds from either fMRI data or SNP data greatly reduces the number of fields that are significantly correlated with the fMRI or genomic data. In fact, of 169 clinical or computerized battery fields, only four remained significantly correlated at the p < 0.01 level after decorrelation from demographics. This result corroborates the result presented in Section III-A, where it was found that scalar race value was the best predictor of scholastic achievement as measured by WRAT score. While FC and SNPs were found to be somewhat predictive of WRAT score, that predictive ability disappeared when FC features were decorrelated from the demographics age, sex, and race using DemoVAE.
Unlike the PNC dataset, from which we used 169 questionnaire and computerized battery fields, the BSNIP dataset included a more modest 32 clinical fields available for analysis. All fields including descriptions are available at the GitHub repository accompanying this manuscript. When processing BSNIP data with DemoVAE, we used age, sex, race, and schizophrenia diagnosis as demographic variables to decorrelate latent features. Interestingly, the five BSNIP fields that remained correlated to DemoVAE latent features at a significance of p < 0.05 were related to medication (taking or not taking anti-psychotics, p < 0.0218) or Positive and Negative Syndrome Scale (PANSS) assessment as to the severity of schizophrenia symptoms [54]. These included total positive symptom score (p < 0.0098), total negative symptom score (p < 0.0296), total general symptom score (p < 0.0011), and total PANSS score (p < 0.00033). This seems to imply that type or severity of schizophrenia symptoms [55] may have effects in fMRI data which are not accounted for by a simple binary diagnosis of the condition or demographics.

E. Imputation of fMRI Scanner Task
Table VI displays RMSEs when imputing FC from one fMRI scanner task, i.e., resting state, working memory, or emotion identification, to another. We find that adding the average training set difference improved RMSE only marginally over simply reusing the input. Either an MLP or DemoVAE in deterministic mode, where only the scanner task "demographic" was changed in the decoder $D_\theta(z, y)$, gave approximately the same RMSE, which was significantly better than adding the average training set difference. By introducing 10% noise to the latent features created by the DemoVAE encoder $E_\phi(x)$, the RMSE was significantly reduced compared to MLP or deterministic DemoVAE when taking the best of 10 samples. Interestingly, in this case the average error of the 10 samples was not increased significantly compared to MLP or DemoVAE in deterministic mode. These results suggest that there is a wide range of natural variability in FC, even when considering the same subject [56].

IV. DISCUSSION
Previous work has found group differences in FC between children and young adults as well as other demographic groups. Sanders et al. have identified that somatomotor-visual network resting state functional connectivity in the Human Connectome Project dataset [57] is most highly correlated with age in children and adolescents [58]. Other investigators have found that somatomotor-visual network connectivity showed an increase in connectivity strength in a longitudinal subset of older adults [59] from the UK Biobank [60]. These data support our finding, shown in Figure 5 (second FC matrices from left, leftmost red arrow), of large somatomotor-visual network connectivity differences between older and younger children. Ficek-Tani et al. have found sex-related differences in the default mode network (DMN), with females having higher intra-DMN connectivity and males having higher connectivity between DMN and other regions [61] [62]. This finding is again reproduced by our own simple analysis of PNC data shown in Figure 5 (middle FC matrix). While the effects of ethnicity on fMRI have been less widely studied, it has been reported that race may have a large effect on the features of FC data [53] [13]. Concerning schizophrenia, Li et al. have reported significant hypoconnectivities in multiple brain networks [63], including the somatomotor network, which aligns with our differential FC map in Figure 5 (far right FC matrix, left arrow). Bernard et al. specifically identified motor networks as contributing to schizophrenia endophenotype [64]. The fact that our DemoVAE model is able to reproduce these group differences in synthetic data while capturing the wide variation in individual fMRI data (see Figure 4) makes it suitable for exploratory use by researchers who do not have permission or have not yet applied to access clinical fMRI datasets. This is further supported by the results, shown in Table V, that models trained on synthetic DemoVAE data perform comparably to models trained on real fMRI data.
Moreover, previous researchers [53] [13] have highlighted the possibility of prediction based on fMRI data being confounded by demographics, e.g., ethnicity in the prediction of scholastic achievement. Likewise, it is known that the prevalence of schizophrenia may be elevated in men compared to women [65], or at least that the age of onset of the disease tends to be different in men versus women [66]. In fact, if not regressing out the effects of age, most measures of scholastic achievement would be highly confounded by children's grade level. We believe the ability to generate fMRI latent features where the confounding effects of demographics are removed may be a valuable addition to the analysis of fMRI FC data. Alternatively, given the present finding of high confounding effects of demographics in FC data, it may be useful for researchers to begin to consider other and newer modalities, such as fNIRS [67], MEG [68], or electrode recordings [69] in addition to fMRI.
Although we find significant reductions in correlations with clinical questionnaire or computerized battery fields after removing the confounding effects of demographics with DemoVAE (see Figure 6), not all correlations appear to be driven by demographic confounds. Among the fields that remained significantly correlated were antipsychotic medication use and four PANSS symptom severity fields in the BSNIP dataset. Indeed, Sendi et al., among others [64], reported that changes in FC correlate with schizophrenia symptoms [70]. Additionally, Chopra et al. identified differential FC in schizophrenia patients taking antipsychotic medication compared to antipsychotic-naive patients [71]. We believe that DemoVAE's ability to quickly identify clinical outcomes that are measurable by FC and unconfounded by demographic information makes it a worthwhile contribution to the neuroimaging community.

V. CONCLUSION
This paper proposes a new way to condition the well-known VAE model on demographic information by decorrelating the latent features from demographics during VAE training and incorporating the demographic information into the decoder stage. This method of conditioning and training creates synthetic samples that recapitulate both group differences and individual subject variation in FC. We show that DemoVAE outperforms a traditional VAE in capturing the whole distribution of fMRI data. We also show that most clinical questionnaire and computerized battery fields that are correlated with fMRI features are in fact confounded by the ability of fMRI features to predict demographics. By contrast, our DemoVAE model shows that several clinical outcomes related to schizophrenia are independent of demographic features. We hope this finding can shed light on the appropriate future use of demographic information in neuroimaging.
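The two ingredients summarized above, decorrelating latents from demographics and passing demographics to the decoder, can be sketched as follows. This is a minimal illustration under stated assumptions, not the DemoVAE implementation: the exact form of the decorrelation term is not given in this section, so a sum of absolute Pearson correlations is assumed here, and the names `decorrelation_penalty` and `decoder_input` and the toy batch are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def decorrelation_penalty(latents, demographics):
    """Assumed penalty: sum of |corr| over all (latent dim, demographic) pairs.

    latents:      one latent vector z per subject (batch x latent_dim)
    demographics: one demographic vector y per subject (batch x n_demo)
    """
    z_columns = list(zip(*latents))
    y_columns = list(zip(*demographics))
    return sum(abs(pearson(zc, yc)) for zc in z_columns for yc in y_columns)


def decoder_input(z, y):
    """Conditioning step: the decoder receives z concatenated with demographics y."""
    return list(z) + list(y)


# Hypothetical toy batch: 4 subjects, 2 latent dimensions, 1 demographic (age).
ages = [[8.0], [12.0], [16.0], [20.0]]
confounded_latents = [[8.0, 1.0], [12.0, -1.0], [16.0, -1.0], [20.0, 1.0]]
decorrelated_latents = [[1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0]]

penalty_confounded = decorrelation_penalty(confounded_latents, ages)
penalty_decorrelated = decorrelation_penalty(decorrelated_latents, ages)
```

In a training loop, a term of this kind would be added to the usual VAE reconstruction and KL losses so that the encoder is discouraged from storing demographic information in z, while the decoder recovers it from y. In the toy batch, the first latent dimension of `confounded_latents` copies age and is penalized, whereas `decorrelated_latents` incurs no penalty.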

Fig. 1. Overview of the demographics-conditioned and decorrelated variational autoencoder (DemoVAE) model. Instead of reconstruction based only on latent features $z = E_\phi(x)$, the DemoVAE model uses demographics $y$ as input to the decoder, $x = D_\theta(z, y)$. The two main uses of the model are inference, which generates latent features $z$ decorrelated from demographics, and sampling, which generates synthetic fMRI data conditioned on user-provided demographics.

Fig. 2. Histogram of standardized (age-corrected) WRAT scores from the PNC dataset, split between the two major race groups in the dataset. There is a clear demographic confound when predicting WRAT score from fMRI or genomic data. We show in Table II that DemoVAE is able to remove the effect of this confound but, at the same time, removes the ability to accurately predict WRAT score.

Fig. 3. Sampled FC matrices for real PNC resting state scans (top) compared to synthetic DemoVAE, VAE, and W-GAN FC data.Visually, all synthetic models generate convincing data.

Fig. 4. t-SNE embeddings of synthetic FC data from DemoVAE, traditional VAE, and W-GAN models overlaid on t-SNE embeddings of real resting-state FC data from the PNC dataset. Blue circles represent embeddings of real subject FC data, while orange crosses represent embeddings of synthetic data. We see that DemoVAE captures the distribution of fMRI FC data as well as or better than a traditional VAE and better than a GAN.

TABLE I. DEMOGRAPHICS FOR THE PNC AND BSNIP DATASETS.

TABLE II. RMSES (MEAN AND STANDARD DEVIATION) OF PREDICTING STANDARDIZED WRAT SCORES USING FMRI FC INPUT, SNP INPUT, DEMOVAE FMRI LATENTS, DEMOVAE SNP LATENTS, AND SCALAR RACE VARIABLE.

Table V shows the predictive RMSE and accuracy when training models on real fMRI data and predicting on synthetic DemoVAE data and vice versa. Predictive tasks include age, sex, race, and SZ diagnosis prediction. The predictive accuracy

TABLE IV. RMSES BETWEEN FC GROUP DIFFERENCES USING REAL VERSUS SYNTHETIC DEMOVAE DATA.

TABLE V. TRANSFER OF MODELS BETWEEN FMRI AND VAE. RMSE (AGE PREDICTION) AND MEAN ACCURACY (SEX, RACE, AND SCHIZOPHRENIA PREDICTION) FOR MLP MODELS TRAINED ON GROUND TRUTH FMRI DATA AND TESTED ON DEMOVAE GENERATED SAMPLES AND VICE VERSA. FC = PEARSON FUNCTIONAL CONNECTIVITY, PCFC = PARTIAL CORRELATION-BASED FUNCTIONAL CONNECTIVITY.

Fig. 6. Correlation of questionnaire, computerized battery, and clinical fields with fMRI FC data versus traditional VAE or decorrelated DemoVAE latent features. Top: PNC dataset fMRI data; bottom left: PNC dataset SNP data; bottom right: BSNIP dataset fMRI data. There were a total of 169 fields in the PNC dataset and 32 in the BSNIP dataset. We see that both FC and traditional VAE latents, which are confounded by patient demographics, have significant correlations with more than half of all fields. Once demographic confounds are removed with DemoVAE, however, both FC and SNP data are significantly correlated with only a small percentage of fields. A list of the fields used is available in the GitHub repository. Blue: correlation with FC features; red: correlation with regular VAE latents; orange: correlation with DemoVAE latents.

TABLE VI. RMSES (MEAN AND STANDARD DEVIATION) FOR THE RECONSTRUCTION OF ONE TASK FC FROM ANOTHER SCANNER TASK IN THE TEST SET, USING MLP MODEL, MEAN DIFFERENCE ON TRAINING SET, AND DEMOVAE. DEMOVAE IS USED IN DETERMINISTIC MODE AND USING BEST AND AVERAGE OF 10 SAMPLES ADDING 10% NOISE IN THE LATENT DIMENSION. THE ABILITY OF THE DEMOVAE TO SAMPLE THE DISTRIBUTION IN THE LATENT SPACE ALLOWS IT TO GENERATE MORE ACCURATE SAMPLES WHEN THE TRANSFER FUNCTION IS NON-DETERMINISTIC.